Homework 5 - Documenting an HR database

Author

Alexa Lauer

Published

October 20, 2024

GitHub repo: https://github.com/cmsc-vcu/cmsc408-fa2024-hw5-alexalauer

Description

The purpose of this database is to manage and organize employee and organizational information within a company, facilitating effective human resource management and decision-making. It serves as a centralized system for storing comprehensive data about employees, their job roles, departments, and locations, enabling efficient retrieval and reporting. The database comprises nine major relations (eight base tables and one view) that capture various aspects of organizational structure and employee details.

The Countries table stores information about countries, linking each to a specific region. The Departments table contains data about different departments, including department names, managers, and locations. The Employees table holds essential employee information, such as names, contact details, job titles, and salary information. The Emp_detail_view provides a comprehensive view of employee data, integrating information from multiple tables to display complete profiles, including job titles and locations. The Jobs table defines job roles and associated salary ranges, while the Job_grades table categorizes these roles into grades based on salary thresholds. The Job_history table tracks employees’ previous positions and departments over time, and the Locations table provides details about physical office locations. Lastly, the Regions table organizes countries into broader regions. This structured setup allows for detailed querying and reporting, making it easy to analyze employee data and departmental performance while supporting the company’s HR strategies.
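The integration performed by Emp_detail_view can be illustrated in miniature with pandas merges. This is only a sketch on toy data (the single-row frames below are invented for illustration); the real view is defined in SQL inside the database:

```python
import pandas as pd

# Toy frames mirroring three of the tables described above (illustrative only)
employees = pd.DataFrame({"employee_id": [103], "last_name": ["Hunold"],
                          "job_id": ["IT_PROG"], "department_id": [60]})
jobs = pd.DataFrame({"job_id": ["IT_PROG"], "job_title": ["Programmer"]})
departments = pd.DataFrame({"department_id": [60], "department_name": ["IT"]})

# Joining Employees to Jobs and Departments, as the view does in SQL
view = employees.merge(jobs, on="job_id").merge(departments, on="department_id")
print(view[["employee_id", "last_name", "job_title", "department_name"]])
```

Each merge plays the role of a SQL join on the shared key column, so the resulting frame carries the combined profile the view exposes.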

Chen diagram

The following diagram highlights the major relations in the database.

(Chen diagram image: entities Countries, Departments, Employees, Jobs, Job_grades, Job_history, Locations, and Regions, linked by the relationships region_of, located_at, works_in, has_job, belongs_to, has_grade, recorded_in, held_in, and located_in.)
Figure 1: A Chen Diagram for Employee Management Database

Crow's foot diagram

The following diagram provides greater detail on the entities and attributes and highlights the cardinality and participation of the relations.

erDiagram
    %% Define entities and their attributes
    Countries {
        string country_id
        string country_name
        int region_id
    }
    
    Departments {
        int department_id
        string department_name
        int manager_id
        int location_id
    }

    Employees {
        int employee_id
        string first_name
        string last_name
        string email
        string phone_number
        date hire_date
        string job_id
        float salary
        float commission_pct
        int manager_id
        int department_id
    }

    Jobs {
        string job_id
        string job_title
        float min_salary
        float max_salary
    }

    Job_grades {
        string grade_level
        float lowest_sal
        float highest_sal
    }

    Job_history {
        int employee_id
        date start_date
        date end_date
        string job_id
        int department_id
    }

    Locations {
        int location_id
        string street_address
        string postal_code
        string city
        string state_province
        string country_id
    }

    Regions {
        int region_id
        string region_name
    }

    %% Define relationships and cardinality
    Employees }o--|| Departments : "works in"
    Employees }o--|| Jobs : "assigned to"
    Employees ||--o{ Job_history : "has"
    
    Departments }o--|| Locations : "located at"
    Locations }o--|| Countries : "part of"
    Countries }o--|| Regions : "in"

    Jobs }o--|| Job_grades : "has"
    Job_history }o--|| Departments : "recorded in"
    Job_history }o--|| Jobs : "held in"
    
    %% Optional: to show views as separate entities
    Emp_detail_view ||--|| Employees : "based on"
    Emp_detail_view ||--|| Jobs : "based on"
    Emp_detail_view ||--|| Departments : "based on"
    Emp_detail_view ||--|| Locations : "based on"
    Emp_detail_view ||--|| Countries : "based on"
    Emp_detail_view ||--|| Regions : "based on"

Listing of tables in the database

The following Python blocks highlight the steps used to document the tables in the database. The code is left in the report to show how fun it was to write Python!

Load libraries

First, import the necessary libraries: os, pandas, sqlalchemy, dotenv, IPython.display, and tabulate. The code then defines a helper function that displays a pandas dataframe as a table in a Quarto or Jupyter environment using a markdown-style grid format. This enhances the presentation of data tables by making them more readable and neatly formatted.

import os
import pandas as pd
from sqlalchemy import create_engine, text
from dotenv import load_dotenv
from IPython.display import display, Markdown
from tabulate import tabulate

def display_dataframe_as_table(df, width="100%"):
    """ Display dataframe in pretty format for Quarto """
    markdown_table = tabulate(df, headers='keys', tablefmt="grid", showindex=False)
    display(Markdown(markdown_table))

Get credentials

This code loads database credentials (username, password, host, and database name) from a .env file and stores them in a dictionary (config). The dictionary can then be used to establish a connection to a database using these credentials, for example with SQLAlchemy. This approach is secure because it avoids hardcoding sensitive information directly in the code.

# load credentials from file ~/.env to OS environment
load_dotenv()

# use these credentials to populate a dictionary
config = {
  'user': os.getenv("HW5_USER"),
  'password': os.getenv("HW5_PASSWORD"),
  'host': os.getenv("HW5_HOST"),
  'database': os.getenv("HW5_DB_NAME")
}
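The corresponding ~/.env file might look like the following; the variable names match the ones read above, and the values here are placeholders:

```shell
# ~/.env -- never commit this file to version control
HW5_USER=your_username
HW5_PASSWORD=your_password
HW5_HOST=your_database_host
HW5_DB_NAME=your_database_name
```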

Open connection to database

This code constructs a connection string for a MySQL database using credentials from the previous config dictionary. It creates an SQLAlchemy Engine for the connection, and also prints a sanitized version of the connection string with the password hidden.

engine_uri = f"mysql+pymysql://{config['user']}:{config['password']}@{config['host']}/{config['database']}"
cnx = create_engine(engine_uri)

engine_uri_for_show = f"mysql+pymysql://{config['user']}:**pwd-goes-here**@{config['host']}/{config['database']}"
print(engine_uri_for_show)
mysql+pymysql://24fa_lauera:**pwd-goes-here**@cmsc508.com/24fa_hr_24fa_lauera
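The engine-and-connection pattern can be sanity-checked without MySQL credentials by pointing SQLAlchemy at an in-memory SQLite database; this sketch uses SQLite purely as a stand-in for the MySQL engine built above:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the MySQL engine built above
engine = create_engine("sqlite:///:memory:")

# The context manager closes the connection when the block exits
with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()
print(result)
```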

Listing of sample data

To generate a listing of sample data from each table in the HR database, the existing code from explore.qmd is modified to retrieve the first few records instead of schema information. The function show_table is defined to accept a table name as a parameter. Inside this function, the SQL query is updated to execute SELECT * FROM {table_name} LIMIT 10;, which selects the first ten records from the specified table. This modification allows the function to display a sample of the actual data contained within the table.

The code then retrieves a list of all table names in the database by executing SHOW TABLES. It iterates through each table name and calls the show_table function to display the sample data for that table. If any errors occur during the execution of the SQL commands, the code catches these exceptions and prints an error message, allowing the process to continue without interruption. As a result, when this code is executed, it provides a concise view of the first ten records from each table, giving insight into the structure and contents of the HR database.

def show_table(table_name):
    """ Show a table using Display and Markdown """
    # Change the query to select the first 10 records from the specified table
    query = f"""
    SELECT * FROM {table_name} LIMIT 10;
    """
    try:
        fields = pd.read_sql(query, cnx)
        display_dataframe_as_table(fields)

    except Exception as e:
        message = str(e)
        print(f"An error occurred:\n\n{message}\n\nIgnoring and moving on.")

try:
    # Use a context manager so the connection is closed when done
    with cnx.connect() as connection:
        table_names = [row[0] for row in connection.execute(text("show tables"))]
    for table_name in table_names:
        show_table(table_name)
except Exception as e:
    message = str(e)
    print(f"An error occurred:\n\n{message}\n\nIgnoring and moving on.")
    table_names = []
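As an aside, SHOW TABLES is MySQL-specific; SQLAlchemy's inspector offers a dialect-independent way to list tables. The sketch below demonstrates it against a throwaway SQLite database with one invented table, since `inspect(cnx)` works the same way on the MySQL engine:

```python
from sqlalchemy import create_engine, inspect, text

# Throwaway SQLite database with one table, just to demonstrate the inspector
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE regions (region_id INTEGER, region_name TEXT)"))

# Dialect-independent listing of table names
tables = inspect(engine).get_table_names()
print(tables)
```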
Countries

| country_id | country_name | region_id |
|------------|--------------|-----------|
| AR         | Argentina    | 2         |
| AU         | Australia    | 3         |
| BE         | Belgium      | 1         |
| BR         | Brazil       | 2         |
| CA         | Canada       | 2         |
| CH         | Switzerland  | 1         |
| CN         | China        | 3         |
| DE         | Germany      | 1         |
| DK         | Denmark      | 1         |
| EG         | Egypt        | 4         |
Departments

| department_id | department_name  | manager_id | location_id |
|---------------|------------------|------------|-------------|
| 10            | Administration   | 200        | 1700        |
| 20            | Marketing        | 201        | 1800        |
| 30            | Purchasing       | 114        | 1700        |
| 40            | Human Resources  | 203        | 2400        |
| 50            | Shipping         | 121        | 1500        |
| 60            | IT               | 103        | 1400        |
| 70            | Public Relations | 204        | 2700        |
| 80            | Sales            | 145        | 2500        |
| 90            | Executive        | 100        | 1700        |
| 100           | Finance          | 108        | 1700        |
Emp_detail_view

| employee_id | job_id | manager_id | department_id | location_id | country_id | first_name | last_name | salary | commission_pct | department_name | job_title | city | state_province | country_name | region_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 103 | IT_PROG | 102 | 60 | 1400 | US | Alexander | Hunold | 9000 | | IT | Programmer | Southlake | Texas | United States of America | Americas |
| 104 | IT_PROG | 103 | 60 | 1400 | US | Bruce | Ernst | 6000 | | IT | Programmer | Southlake | Texas | United States of America | Americas |
| 105 | IT_PROG | 103 | 60 | 1400 | US | David | Austin | 4800 | | IT | Programmer | Southlake | Texas | United States of America | Americas |
| 106 | IT_PROG | 103 | 60 | 1400 | US | Valli | Pataballa | 4800 | | IT | Programmer | Southlake | Texas | United States of America | Americas |
| 107 | IT_PROG | 103 | 60 | 1400 | US | Diana | Lorentz | 4200 | | IT | Programmer | Southlake | Texas | United States of America | Americas |
| 120 | ST_MAN | 100 | 50 | 1500 | US | Matthew | Weiss | 8000 | | Shipping | Stock Manager | South San Francisco | California | United States of America | Americas |
| 121 | ST_MAN | 100 | 50 | 1500 | US | Adam | Fripp | 8200 | | Shipping | Stock Manager | South San Francisco | California | United States of America | Americas |
| 122 | ST_MAN | 100 | 50 | 1500 | US | Payam | Kaufling | 7900 | | Shipping | Stock Manager | South San Francisco | California | United States of America | Americas |
| 123 | ST_MAN | 100 | 50 | 1500 | US | Shanta | Vollman | 6500 | | Shipping | Stock Manager | South San Francisco | California | United States of America | Americas |
| 124 | ST_MAN | 100 | 50 | 1500 | US | Kevin | Mourgos | 5800 | | Shipping | Stock Manager | South San Francisco | California | United States of America | Americas |
Employees

| employee_id | first_name | last_name | email | phone_number | hire_date | job_id | salary | commission_pct | manager_id | department_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-17 | AD_PRES | 24000 | | nan | 90 |
| 101 | Neena | Kochhar | NKOCHHAR | 515.123.4568 | 2005-09-21 | AD_VP | 17000 | | 100 | 90 |
| 102 | Lex | De Haan | LDEHAAN | 515.123.4569 | 2001-01-13 | AD_VP | 17000 | | 100 | 90 |
| 103 | Alexander | Hunold | AHUNOLD | 590.423.4567 | 2006-01-03 | IT_PROG | 9000 | | 102 | 60 |
| 104 | Bruce | Ernst | BERNST | 590.423.4568 | 2007-05-21 | IT_PROG | 6000 | | 103 | 60 |
| 105 | David | Austin | DAUSTIN | 590.423.4569 | 2005-06-25 | IT_PROG | 4800 | | 103 | 60 |
| 106 | Valli | Pataballa | VPATABAL | 590.423.4560 | 2006-02-05 | IT_PROG | 4800 | | 103 | 60 |
| 107 | Diana | Lorentz | DLORENTZ | 590.423.5567 | 2007-02-07 | IT_PROG | 4200 | | 103 | 60 |
| 108 | Nancy | Greenberg | NGREENBE | 515.124.4569 | 2002-08-17 | FI_MGR | 12008 | | 101 | 100 |
| 109 | Daniel | Faviet | DFAVIET | 515.124.4169 | 2002-08-16 | FI_ACCOUNT | 9000 | | 108 | 100 |
Job_grades

| grade_level | lowest_sal | highest_sal |
|-------------|------------|-------------|
| A           | 1000       | 2999        |
| B           | 3000       | 5999        |
| C           | 6000       | 9999        |
| D           | 10000      | 14999       |
| E           | 15000      | 24999       |
| F           | 25000      | 40000       |
Job_history

| employee_id | start_date | end_date   | job_id     | department_id |
|-------------|------------|------------|------------|---------------|
| 101         | 1997-09-21 | 2001-10-27 | AC_ACCOUNT | 110           |
| 101         | 2001-10-28 | 2005-03-15 | AC_MGR     | 110           |
| 102         | 2001-01-13 | 2006-07-24 | IT_PROG    | 60            |
| 114         | 2006-03-24 | 2007-12-31 | ST_CLERK   | 50            |
| 122         | 2007-01-01 | 2007-12-31 | ST_CLERK   | 50            |
| 176         | 2006-03-24 | 2006-12-31 | SA_REP     | 80            |
| 176         | 2007-01-01 | 2007-12-31 | SA_MAN     | 80            |
| 200         | 1995-09-17 | 2001-06-17 | AD_ASST    | 90            |
| 200         | 2002-07-01 | 2006-12-31 | AC_ACCOUNT | 90            |
| 201         | 2004-02-17 | 2007-12-19 | MK_REP     | 20            |
Jobs

| job_id     | job_title                      | min_salary | max_salary |
|------------|--------------------------------|------------|------------|
| AC_ACCOUNT | Public Accountant              | 4200       | 9000       |
| AC_MGR     | Accounting Manager             | 8200       | 16000      |
| AD_ASST    | Administration Assistant       | 3000       | 6000       |
| AD_PRES    | President                      | 20080      | 40000      |
| AD_VP      | Administration Vice President  | 15000      | 30000      |
| FI_ACCOUNT | Accountant                     | 4200       | 9000       |
| FI_MGR     | Finance Manager                | 8200       | 16000      |
| HR_REP     | Human Resources Representative | 4000       | 9000       |
| IT_PROG    | Programmer                     | 4000       | 10000      |
| MK_MAN     | Marketing Manager              | 9000       | 15000      |
Locations

| location_id | street_address          | postal_code | city                | state_province   | country_id |
|-------------|-------------------------|-------------|---------------------|------------------|------------|
| 1000        | 1297 Via Cola di Rie    | 00989       | Roma                |                  | IT         |
| 1100        | 93091 Calle della Testa | 10934       | Venice              |                  | IT         |
| 1200        | 2017 Shinjuku-ku        | 1689        | Tokyo               | Tokyo Prefecture | JP         |
| 1300        | 9450 Kamiya-cho         | 6823        | Hiroshima           |                  | JP         |
| 1400        | 2014 Jabberwocky Rd     | 26192       | Southlake           | Texas            | US         |
| 1500        | 2011 Interiors Blvd     | 99236       | South San Francisco | California       | US         |
| 1600        | 2007 Zagora St          | 50090       | South Brunswick     | New Jersey       | US         |
| 1700        | 2004 Charade Rd         | 98199       | Seattle             | Washington       | US         |
| 1800        | 147 Spadina Ave         | M5V 2L7     | Toronto             | Ontario          | CA         |
| 1900        | 6092 Boxwood St         | YSW 9T2     | Whitehorse          | Yukon            | CA         |
Regions

| region_id | region_name            |
|-----------|------------------------|
| 1         | Europe                 |
| 2         | Americas               |
| 3         | Asia                   |
| 4         | Middle East and Africa |

Reflection

  1. Please add a paragraph or two about how this project went.

Getting everything to work on the software side was extremely difficult. I downloaded all of the packages early in the week but was then unable to get them to run in VS Code. Everything worked in the Windows terminal, but not in VS Code, so in the end I ran everything through the terminal and used VS Code only for editing. Earlier in the semester I used the shortcut to compile my qmd files, and the instructions suggested the same thing; however, that does not work here because it reverts to a different env that does not contain Jupyter. Even after I had all of the software working, it took at least four hours before I could render the explore qmd. Once the file rendered, the tables still would not display properly: they printed only the column names, with nothing inside the columns. This happened because, when I set up the connection, I used my user database instead of the HR one. I also didn't know whether we were supposed to edit the explore file (it was listed as written by Dr. Leonard, which typically means do not edit), but I needed to change the select command so that the connection matched the database I was querying.

  2. Was it harder or easier than you expected?

Loading the initial packages was very straightforward; the website was very helpful, and that process was much simpler than I expected. Rendering the qmd file once I had loaded the packages was much more difficult than I predicted.

  3. Did the instructor provide too much information or not enough information?

I think he provided a good amount of information spread between lecture and the website.

  4. Do you have suggestions for how it can be improved?

I did not set up a poetry init, and I still don't know whether I was supposed to. The "How do you know your tools are working together?" section is a little misleading: some of the commands either don't work with the way we set up the env, or they aren't explained very well.

  5. How does this assignment relate to project deliverable 5?

Deliverable 5 focused on defining the scope of our database, including the problem domain and the need for the database. Generating a list of the SQL tables and fields helps determine the problem domain and the need for a structured solution, and seeing a previously designed database helps us understand why we must clearly define the need for our own.